Setting the options for knitr…
library(knitr)
knitr::opts_chunk$set(echo = TRUE,
comment = NA,
prompt = FALSE,
warning = FALSE,
message = FALSE,
fig.align="center",
fig.width = 8.125,
out.width = "100%",
fig.path = "D:/figures/eL2/Part2-lowres/",
dev = c('png', 'tiff'),
dev.args = list(png = list(type = "cairo"), tiff = list(compression = 'lzw')),
dpi=150,
cache=TRUE,
cache.path = "D:/cache/eL2/Part2-lowres/",
autodep=TRUE)
options(width = 1000, scipen = 999, knitr.kable.NA = '')
Setting the options for summarytools…
library(summarytools)
st_options(plain.ascii = FALSE, # Always use this option in Rmd documents
style = "rmarkdown", # Always use this option in Rmd documents
footnote = NA, # Makes html-rendered results more concise
subtitle.emphasis = FALSE) # Improves layout with some rmarkdown themes
st_css()
Loading libraries…
library(tidyverse)
library(ggplot2) # Figures
library(gridExtra)
library(kableExtra)
library(lme4) # For lmer and lme
library(lmerTest) # with p-values in the summary() and anova() functions
library(emmeans) # multiple comparisons and plotting effects
library(FSA) # For Dunn's test
library(stats) # For Fligner-Killeen's test
library(DHARMa) # simulate residuals
library(sjPlot) # Plotting effects
library(glmmTMB)
library(stargazer)
Cleaning the memory:
rm(list = ls(all.names = TRUE))
Specifying a seed for random number generation, for the sake of reproducibility:
set.seed(123)
We load what we saved previously:
load(file = "model.final.RData")
As a reminder, our model is a logistic regression with a number of fixed effects, which include several interactions, and a random-effects structure which consists of i) random intercepts for nonword and L1 and ii) a random categorical slope for subject, with three sets of values for the three possible values of occurrence_l (whether the nonword to repeat contains no l, an l in internal coda position, or an l in final position).
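As a cross-check, the call that produced model.final can be sketched from the formula reported in its summary further below; the control object ctrl is not shown in this document, so the optimizer settings here are an assumption:

```r
library(lme4)

# Sketch of the fitting call for model.final (assumption: the contents of the
# control object 'ctrl', which are not shown in this document)
ctrl <- glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
model.final <- glmer(
  repetition ~ occurrence_l + branching_onset + V +
    scale(LoE) + scale(age) + scale(rec_voc) + scale(phono_mem) +
    occurrence_l:scale(LoE) + branching_onset:scale(LoE) +
    scale(rec_voc):scale(LoE) + scale(age):scale(phono_mem) +
    scale(age):scale(rec_voc) + scale(rec_voc):scale(phono_mem) +
    (0 + occurrence_l | subject) + (1 | L1) + (1 | nonword),
  data = df, family = binomial(link = "logit"), control = ctrl)
```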
In order to trust the outputs of our model, we need to make sure that the assumptions underlying a mixed-effects logistic regression are met. While these assumptions are not as clear-cut as for linear regression, the following conditions matter:
- The normality of the different distributions for the random effects
- The correct distribution of the residuals of the model
(Note that we have already checked that our model is not singular)
p <- plot_model(model.final, type = "diag", show.values = TRUE, value.offset = .3)
p$nonword + labs(title = "Random intercepts for nonword")
p$L1 + labs(title = "Random intercepts for L1")
p$subject + labs(title = "Random slopes for subject", subtitle = "occurrence_lcoda: l in internal coda position -- occurrence_lfinal: l in final position -- occurrence_lother: other nonwords") + theme_sjplot()
The quantile-quantile plots of the various random effects indicate that
their levels follow a distribution which is quite close to a Gaussian
distribution.
There is no expected normality of the model residuals with a logistic
regression, but we can simulate residuals with the function
simulateResiduals() of the DHARMa
package. According to the authors of the package: ‘Even experienced
statistical analysts currently have few options to diagnose
misspecification problems in GLM. DHARMa package aims at
solving these problems by creating readily interpretable residuals for
generalized linear models that are standardized to values between 0 and
1, and that can be interpreted as intuitively as residuals for the
linear model. This is achieved by a simulation-based approach that
transforms the residuals to a standardized scale. The key idea is that,
if the model is correctly specified, then the observed data should look
like as if it was created from the fitted model. Hence, for a correctly
specified model, all values of the cumulative distribution should appear
with equal probability. That means we expect the distribution of the
residuals to be flat, regardless of the binomial model structure.’ —
Source: DHARMa:
residual diagnostics for hierarchical (multi-level/mixed) regression
models
sim.residuals <- simulateResiduals(fittedModel = model.final, n = 1000, refit = FALSE)
plot(sim.residuals) # plotSimulatedResiduals() is deprecated in recent versions of DHARMa
According to the graphics, the simulated residuals do not reveal any misspecification of the model: they are uniformly and homogeneously distributed with respect to the model predictions (right panel), and they also closely follow the expected uniform distribution in the quantile-quantile plot (left panel).
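Beyond the plots, DHARMa also offers formal tests of the same properties; a brief sketch applying standard DHARMa functions to the residuals simulated above:

```r
# Formal checks complementing the diagnostic plots (all from the DHARMa package)
testUniformity(sim.residuals)   # KS test of uniformity of the scaled residuals
testDispersion(sim.residuals)   # over/underdispersion relative to the fitted model
testOutliers(sim.residuals)     # more outliers than expected under the model?
```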
We can also inspect the residuals not with respect to the model predictions, but to the different main fixed effects.
plotResiduals(sim.residuals, form = df$occurrence_l, xlab = "Occurrence of l", ylab = "Simulated Standardized Residuals")
plotResiduals(sim.residuals, form = df$branching_onset, xlab = "branching_onset", ylab = "Simulated Standardized Residuals")
plotResiduals(sim.residuals, form = df$V, xlab = "V", ylab = "Simulated Standardized Residuals", asFactor = F)
We do not see any violation of the uniformity and homogeneity of the
distribution of the simulated residuals for our different categorical
predictors.
plotResiduals(sim.residuals, form = df$rec_voc, xlab = "rec_voc", ylab = "Simulated Standardized Residuals")
plotResiduals(sim.residuals, form = df$phono_mem, xlab = "phono_mem", ylab = "Simulated Standardized Residuals", asFactor = F)
plotResiduals(sim.residuals, form = df$LoE, xlab = "LoE", ylab = "Simulated Standardized Residuals")
plotResiduals(sim.residuals, form = df$age, xlab = "age", ylab = "Simulated Standardized Residuals")
We also do not see any violation of the uniformity and homogeneity of the distribution of the simulated residuals for our different numerical predictors.
The key assumptions of our mixed-effects logistic regression are satisfied. We can therefore reasonably trust the outputs of the model, which we analyze in the next section.
We initially considered a number of fixed effects to be assessed with our statistical model.
There are a number of main effects:
- the item-related categorical variables occurrence_l, branching_onset and V
- the subject-related continuous variables rec_voc, phono_mem, LoE and age
There are also a number of interactions:
- occurrence_l : LoE and branching_onset : LoE
- rec_voc : LoE, age : phono_mem, age : rec_voc and phono_mem : rec_voc
We formulate precise oriented hypotheses to be tested for the main effects (all other things being equal):
- a nonword gets easier to repeat when shifting from l in internal coda position to l in final position to another structure
- the more branching onsets in a nonword, the more difficult it is to repeat
- the fewer vowels in a nonword, the easier it is to repeat
- the larger a child's French receptive vocabulary, the higher the probability of correct repetition
- the larger a child's phonological memory, the higher the probability of correct repetition
- the longer the exposure to French, the higher the probability of correct repetition
- the older a child, the higher the probability of correct repetition
In the next paragraphs, one should note that the graphical representations report probabilities of correct repetition of a nonword, i.e., on the scale of the predicted variable. However, the statistical tests of the various effects consider the non-transformed estimates and confidence intervals, i.e., on the scale of the linear predictor (the linear combination of the independent variables, in log-odds).
We rely on the emmeans package and the computation of estimated marginal means of either levels of factors or linear trends. To adjust the p-values in the case of multiple tests, we rely on a multivariate t distribution (adjust = “mvt”) with the same covariance structure as the estimates, as it is the closest to an exact adjustment (see https://cran.r-project.org/web/packages/emmeans/vignettes/confidence-intervals.html).
We can first investigate the higher-level predictors of the model, that is the different interactions.
The first two interactions are between one categorical variable and the continuous variable LoE (occurrence_l : LoE and branching_onset : LoE).
We first define a range of variation for the values of LoE:
list.LoE <- list(LoE = seq(min(df$LoE), max(df$LoE), by = 1))
For occurrence_l : LoE:
plot_model(model.final, type = "emm", terms = c("LoE [all]", "occurrence_l"))
emtrends(model.final, pairwise ~ occurrence_l | LoE, var = "LoE", adjust = "mvt", infer = c(T, T))$emtrends
LoE = 180:
occurrence_l LoE.trend SE df asymp.LCL asymp.UCL z.ratio p.value
coda 0.000987 0.00347 Inf -0.00582 0.00779 0.284 0.7761
final 0.004485 0.00342 Inf -0.00222 0.01119 1.311 0.1899
other 0.000346 0.00144 Inf -0.00248 0.00317 0.240 0.8103
Results are averaged over the levels of: branching_onset, V
Confidence level used: 0.95
$contrasts
LoE = 180:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
coda - final -0.003498 0.00449 Inf -0.01389 0.00690 -0.780 0.7039
coda - other 0.000641 0.00290 Inf -0.00607 0.00735 0.221 0.9719
final - other 0.004139 0.00346 Inf -0.00389 0.01217 1.195 0.4414
Results are averaged over the levels of: branching_onset, V
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
The figures show that the slopes for the levels other and coda are quite close, and visually quite different from the slope for the third level final. The statistical tests, however, fail to detect a significant difference. A likely reason is the large standard errors for coda and final, which in turn stem from the small number of nonwords instantiating these two levels (4 and 7 respectively, compared with 60 nonwords without l in coda or final position):
my_table <- df %>%
select(nonword, occurrence_l) %>%
unique() %>%
with(., table(occurrence_l)) %>%
as.data.frame()
colnames(my_table) <- c("Occurrence of l", "# nonwords")
my_table
  Occurrence of l # nonwords
1 coda 4
2 final 7
3 other 60
For branching_onset : LoE:
plot_model(model.final, type = "emm", terms = c("LoE [all]", "branching_onset"))
emtrends(model.final, pairwise ~ branching_onset | LoE, var = "LoE", adjust = "mvt", infer = c(T, T))$emtrends
LoE = 180:
branching_onset LoE.trend SE df asymp.LCL asymp.UCL z.ratio p.value
0 0.002483 0.00189 Inf -0.00122 0.00618 1.316 0.1883
1 0.000658 0.00201 Inf -0.00329 0.00460 0.327 0.7439
2 0.002678 0.00308 Inf -0.00335 0.00871 0.870 0.3842
Results are averaged over the levels of: occurrence_l, V
Confidence level used: 0.95
$contrasts
LoE = 180:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
branching_onset0 - branching_onset1 0.001825 0.00111 Inf -0.000745 0.00440 1.637 0.2162
branching_onset0 - branching_onset2 -0.000195 0.00256 Inf -0.006101 0.00571 -0.076 0.9966
branching_onset1 - branching_onset2 -0.002020 0.00254 Inf -0.007881 0.00384 -0.795 0.6938
Results are averaged over the levels of: occurrence_l, V
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
While the figures show that the slopes for LoE differ according to the value of branching_onset, the statistical tests reveal that these slopes are actually not significantly different from each other.
We then have four interactions between two continuous variables: rec_voc : LoE, age : phono_mem, age : rec_voc and phono_mem : rec_voc.
For rec_voc : LoE:
plot_model(model.final, type = "emm", terms = c("LoE [all]", "rec_voc"))
We can see that the different curves are nearly parallel, which
corroborates the lack of significant interaction between the two
variables.
Knowing that the interaction between rec_voc and LoE corresponds to the change in the slope of LoE for every one-unit increase in rec_voc (or vice versa), we can assess the significance of this interaction by looking at the contrast / difference between two slopes of rec_voc separated by a one-unit increase in LoE (the result is independent of the choice of the two values separated by one unit).
list.LoE.red <- list(LoE = seq(median(df$LoE) - 0.5, median(df$LoE) + 0.5, by = 1))
emtrends(model.final, pairwise ~ LoE, var = "rec_voc", at = list.LoE.red, adjust = "mvt", infer = c(T, T))$emtrends
LoE rec_voc.trend SE df asymp.LCL asymp.UCL z.ratio p.value
180 0.0721 0.0165 Inf 0.0398 0.104 4.373 <.0001
180 0.0722 0.0165 Inf 0.0398 0.105 4.368 <.0001
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
LoE179.5 - LoE180.5 -0.000127 0.000193 Inf -0.000506 0.000253 -0.655 0.5126
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
We find that the estimate for the interaction is not significantly different from 0: the p-value is much larger than 0.05.
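The logic above can be verified on a toy example: in any model with an x:z interaction, the slope of x at z = a + 1 minus the slope of x at z = a equals the interaction coefficient. A minimal base-R sketch with illustrative (not real) data:

```r
# Toy check: the difference between two slopes of x separated by one unit of z
# equals the x:z interaction coefficient (names here are illustrative)
set.seed(1)
toy <- data.frame(x = rnorm(100), z = rnorm(100))
toy$y <- 1 + 2 * toy$x - 0.5 * toy$z + 0.3 * toy$x * toy$z + rnorm(100)
fit <- lm(y ~ x * z, data = toy)
b <- coef(fit)
slope_x_at <- function(a) b["x"] + b["x:z"] * a          # dE[y]/dx at z = a
all.equal(unname(slope_x_at(1) - slope_x_at(0)), b[["x:z"]])  # TRUE
```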
For age : phono_mem:
plot_model(model.final, type = "emm", terms = c("age [all]", "phono_mem"))
list.phono_mem.red <- list(phono_mem = seq(median(df$phono_mem) - 0.5, median(df$phono_mem) + 0.5, by = 1))
emtrends(model.final, pairwise ~ phono_mem, var = "age", at = list.phono_mem.red, adjust = "mvt", infer = c(T, T))$emtrends
phono_mem age.trend SE df asymp.LCL asymp.UCL z.ratio p.value
3.5 0.0131 0.0103 Inf -0.00715 0.0333 1.267 0.2052
4.5 0.0201 0.0127 Inf -0.00493 0.0450 1.573 0.1157
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
phono_mem3.5 - phono_mem4.5 -0.00699 0.012 Inf -0.0306 0.0166 -0.580 0.5617
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
Once again, the p-value is much larger than 0.05.
For age : rec_voc:
plot_model(model.final, type = "emm", terms = c("age [all]", "rec_voc"))
list.rec_voc.red <- list(rec_voc = seq(median(df$rec_voc) - 0.5, median(df$rec_voc) + 0.5, by = 1))
emtrends(model.final, pairwise ~ rec_voc, var = "age", at = list.rec_voc.red, adjust = "mvt", infer = c(T, T))$emtrends
rec_voc age.trend SE df asymp.LCL asymp.UCL z.ratio p.value
19.5 0.0153 0.00968 Inf -0.00365 0.0343 1.583 0.1135
20.5 0.0159 0.00954 Inf -0.00275 0.0346 1.671 0.0947
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
rec_voc19.5 - rec_voc20.5 -0.000624 0.00148 Inf -0.00352 0.00227 -0.423 0.6726
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
The p-value for the interaction is far above 0.05 (p = 0.6726).
For phono_mem : rec_voc:
plot_model(model.final, type = "emm", terms = c("phono_mem", "rec_voc"))
list.phono_mem.red <- list(phono_mem = seq(median(df$phono_mem) - 0.5, median(df$phono_mem) + 0.5, by = 1))
emtrends(model.final, pairwise ~ phono_mem, var = "rec_voc", at = list.phono_mem.red, adjust = "mvt", infer = c(T, T))$emtrends
phono_mem rec_voc.trend SE df asymp.LCL asymp.UCL z.ratio p.value
3.5 0.0738 0.0185 Inf 0.0375 0.110 3.983 0.0001
4.5 0.0691 0.0204 Inf 0.0292 0.109 3.393 0.0007
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
phono_mem3.5 - phono_mem4.5 0.00472 0.0205 Inf -0.0355 0.045 0.230 0.8181
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
The p-value is once again much larger than 0.05.
Our investigation of the interactions thus shows that none of them is statistically significant. A possible option would then be to simplify the model by dropping these interactions. However, this amounts to model selection (for the fixed effects), which a number of prominent statisticians warn against. In what follows, we therefore assess main effects despite the presence of interactions in the model.
Given that none of the interactions we thought could be significant appears to be so, we can focus on the main effects in our model, i.e. the effects of the item-related categorical variables occurrence_l, branching_onset, and V, and the subject-related continuous variables rec_voc, phono_mem, LoE and age.
For occurrence_l:
plot_model(model.final, type = "emm", terms = "occurrence_l")
summary(emmeans(model.final, pairwise ~ occurrence_l, adjust = "mvt", side = "<"), infer = c(TRUE, TRUE), null = 0)$contrasts
 contrast     estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
coda - final -0.406 0.517 Inf -Inf 0.669 -0.786 0.4699
coda - other -1.118 0.389 Inf -Inf -0.309 -2.874 0.0059
final - other -0.711 0.362 Inf -Inf 0.042 -1.963 0.0647
Results are averaged over the levels of: branching_onset, V
Results are given on the log odds ratio (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
P values are left-tailed
We set the parameter side to reflect our hypothesis that a nonword gets easier to repeat when shifting from l in coda position to l in final position to another structure.
We find that:
- a nonword with l in internal coda position is significantly harder to repeat than a nonword with neither type of l (coda - other: p = 0.0059)
- the difference between l in final position and nonwords without l in coda or final position is only marginal (final - other: p = 0.0647)
- the difference between l in internal coda position and l in final position is not significant (coda - final: p = 0.4699)
For branching_onset:
plot_model(model.final, type = "emm", terms = "branching_onset")
summary(emmeans(model.final, pairwise ~ branching_onset, adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)$contrasts
 contrast                            estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
branching_onset0 - branching_onset1 0.877 0.176 Inf 0.521 Inf 4.974 <.0001
branching_onset0 - branching_onset2 1.989 0.481 Inf 1.018 Inf 4.133 <.0001
branching_onset1 - branching_onset2 1.113 0.488 Inf 0.128 Inf 2.279 0.0266
Results are averaged over the levels of: occurrence_l, V
Results are given on the log odds ratio (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
P values are right-tailed
We set the parameter side to reflect our hypothesis that the more branching onsets in a nonword, the more difficult it is to repeat.
We observe that all the contrasts are significant, and that therefore:
- a nonword without branching onset is easier to repeat than a nonword with one branching onset (p < 0.0001)
- a nonword without branching onset is easier to repeat than a nonword with two branching onsets (p < 0.0001)
- a nonword with one branching onset is easier to repeat than a nonword with two branching onsets (p = 0.0266)
For V:
plot_model(model.final, type = "emm", terms = "V")
summary(emmeans(model.final, pairwise ~ V, adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)$contrasts
 contrast estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
V1 - V2 0.144 0.203 Inf -0.2792 Inf 0.707 0.4905
V1 - V3 0.616 0.208 Inf 0.1825 Inf 2.956 0.0044
V2 - V3 0.472 0.206 Inf 0.0427 Inf 2.287 0.0303
Results are averaged over the levels of: occurrence_l, branching_onset
Results are given on the log odds ratio (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
P values are right-tailed
The parameter side corresponds to the hypothesis that the fewer vowels in a nonword, the easier it is to repeat.
We find two significant differences: a nonword with a single vowel is easier to repeat than a nonword with 3 vowels (p = 0.0044), and a nonword with 2 vowels is easier to repeat than a nonword with 3 vowels (p = 0.0303).
For rec_voc:
plot_model(model.final, type = "emm", terms = "rec_voc [all]")
summary(emtrends(model.final, ~ rec_voc, var = "rec_voc", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
 rec_voc rec_voc.trend     SE  df asymp.LCL asymp.UCL z.ratio p.value
19.7 0.0722 0.0165 Inf 0.045 Inf 4.371 <.0001
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis that the larger a child’s French receptive vocabulary, the higher the probability of correct repetition.
We observe a significant effect of rec_voc (p < 0.0001): the size of a child’s receptive vocabulary significantly and positively impacts the child’s ability to correctly repeat nonwords.
For phono_mem:
plot_model(model.final, type = "emm", terms = "phono_mem")
summary(emtrends(model.final, ~ phono_mem, var = "phono_mem", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
 phono_mem phono_mem.trend   SE  df asymp.LCL asymp.UCL z.ratio p.value
3.84 0.127 0.12 Inf -0.071 Inf 1.054 0.1459
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis that the larger a child’s phonological memory, the higher the probability of correct repetition.
We do not observe a significant effect of phono_mem.
For LoE:
plot_model(model.final, type = "emm", terms = "LoE [all]")
summary(emtrends(model.final, ~ LoE, var = "LoE", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
 LoE LoE.trend      SE  df asymp.LCL asymp.UCL z.ratio p.value
180 0.00194 0.00203 Inf -0.0014 Inf 0.957 0.1694
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis that the longer the exposure to French, the higher the probability of correct repetition.
We do not observe a significant effect of LoE on the probability of correct repetition.
For age:
plot_model(model.final, type = "emm", terms = "age [all]")
summary(emtrends(model.final, ~ age, var = "age", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
 age age.trend      SE  df asymp.LCL asymp.UCL z.ratio p.value
89.6 0.0154 0.00964 Inf -0.000422 Inf 1.601 0.0547
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis that the older a child is, the higher the probability of correct repetition.
We do not observe a significant effect of age, although it falls just short of significance (p = 0.0547).
Not all our hypotheses were confirmed. We found the following results:
- significant effects of occurrence_l (l in internal coda position makes repetition harder than no l in coda or final position), branching_onset (the more branching onsets, the harder the repetition) and V (3 vowels make repetition harder than 1 or 2 vowels)
- a significant positive effect of rec_voc
Additionally, we observed two statistical tendencies:
- a marginal difference between l in final position and nonwords without l in coda or final position (p = 0.0647)
- a near-significant positive effect of age (p = 0.0547)
We did not observe effects of phono_mem or of LoE. We also did not find a significant difference between l appearing in internal coda position in nonwords and l appearing in final position.
Our analysis overall proves to be quite simple, in the sense that we did not observe any significant interaction.
p1 <- plot_model(model.final, type = "emm", terms = "occurrence_l", show.values = TRUE, value.offset = .3) +
labs(title = "Effect of occurrence_l", x = "Location of /l/", y = "Predicted probability of correct repetition")
p2 <- plot_model(model.final, type = "emm", terms = "branching_onset", show.values = TRUE, value.offset = .3) +
labs(title = "Effect of branching_onset", x = "Number of branching onsets", y = "Predicted probability of correct repetition")
p3 <- plot_model(model.final, type = "emm", terms = "V", show.values = TRUE, value.offset = .3) +
labs(title = "Effect of V", x = "Number of vowels", y = "Predicted probability of correct repetition")
p4 <- plot_model(model.final, type = "emm", terms = "rec_voc [all]", show.values = TRUE, value.offset = .3) +
labs(title = "Effect of rec_voc", x = "Size of the receptive vocabulary", y = "Predicted probability of correct repetition")
p5 <- plot_model(model.final, type = "emm", terms = "age [all]", show.values = TRUE, value.offset = .3) +
labs(title = "Effect of age", x = "Age", y = "Predicted probability of correct repetition")
grid.arrange(p1, p2, p3, p4, p5, ncol = 3)
While our analysis is centered on fixed effects, it is interesting to pay attention to the distribution of values of the different random effects in our model.
p <- plot_model(model.final, type = "re", sort.est = "sort.all", grid = F, transform = NULL)
The most interesting source of information is likely the distribution of the random intercepts of the different children’s L1:
p[[5]] + scale_color_grey()
We can observe that the distribution of values suggests that being
bilingual before learning French helps with the repetition task, since
all but one case of bilingualism correspond to positive intercepts,
i.e., a facilitatory effect for the task (the negative intercept for
Russian/Chechen is also close to 0).
We can then display the sorted distribution of values of the random intercepts for nonwords:
p[[1]] + scale_color_grey()
There is no striking element in the distribution observed.
Finally, we can display, although they are difficult to interpret, the values of the random effects for subject and the three levels of the variable occurrence_l:
grid.arrange(p[[4]] + scale_color_grey(), p[[3]] + scale_color_grey(), p[[2]] + scale_color_grey(), ncol = 2)
We can observe that the variance of the values is larger when l appears in internal coda position or in final position, as shown explicitly in the summary of the model:
summary(model.final)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: repetition ~ occurrence_l + branching_onset + V + scale(LoE) + scale(age) + scale(rec_voc) + scale(phono_mem) + occurrence_l:scale(LoE) + branching_onset:scale(LoE) + scale(rec_voc):scale(LoE) + scale(age):scale(phono_mem) + scale(age):scale(rec_voc) + scale(rec_voc):scale(phono_mem) + (0 + occurrence_l | subject) + (1 | L1) + (1 | nonword)
Data: df
Control: ctrl
AIC BIC logLik deviance df.resid
4043.5 4216.1 -1994.8 3989.5 4375
Scaled residuals:
Min 1Q Median 3Q Max
-5.6570 0.1800 0.3274 0.4927 2.3401
Random effects:
Groups Name Variance Std.Dev. Corr
nonword (Intercept) 0.3337 0.5777
subject occurrence_lcoda 2.3767 1.5417
occurrence_lfinal 2.7251 1.6508 0.08
occurrence_lother 0.2893 0.5379 0.84 -0.09
L1 (Intercept) 0.1507 0.3882
Number of obs: 4402, groups: nonword, 71; subject, 62; L1, 18
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.20941 0.44306 2.730 0.00634 **
occurrence_lfinal 0.40631 0.51722 0.786 0.43212
occurrence_lother 1.11752 0.38884 2.874 0.00405 **
branching_onset1 -0.87664 0.17626 -4.974 0.000000657 ***
branching_onset2 -1.98915 0.48126 -4.133 0.000035771 ***
V2 -0.14396 0.20348 -0.707 0.47926
V3 -0.61557 0.20827 -2.956 0.00312 **
scale(LoE) 0.12135 0.26340 0.461 0.64500
scale(age) 0.15426 0.09635 1.601 0.10936
scale(rec_voc) 0.46160 0.10561 4.371 0.000012382 ***
scale(phono_mem) 0.09446 0.08959 1.054 0.29172
occurrence_lfinal:scale(LoE) 0.27739 0.35571 0.780 0.43550
occurrence_lother:scale(LoE) -0.05084 0.22958 -0.221 0.82475
branching_onset1:scale(LoE) -0.14471 0.08839 -1.637 0.10159
branching_onset2:scale(LoE) 0.01546 0.20311 0.076 0.93934
scale(LoE):scale(rec_voc) 0.06426 0.09813 0.655 0.51260
scale(age):scale(phono_mem) 0.05203 0.08966 0.580 0.56171
scale(age):scale(rec_voc) 0.03987 0.09434 0.423 0.67259
scale(rec_voc):scale(phono_mem) -0.02249 0.09781 -0.230 0.81811
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We load an additional table with participant-based rather than item-based data.
df_add_data <- read.table("additional_data.txt", header = TRUE, sep = "\t") %>% as_tibble()
We also load data for monolinguals for a comparison between them and bilinguals:
df_L1 <- read.table("Data_Mo_TD_Mo_SLI_all_items.txt", header = TRUE, sep = "\t") %>%
as_tibble() %>%
mutate(repetition = as.character(repetition), # To allow future binding of the corresponding column with the one from df
       group = str_replace_all(group, "MoSLI", "MoDLD")) # DLD was previously referred to as SLI
We finally focus on the production of specific structures in nonwords, namely l in non-final coda position, l in final position and l in branching onsets, regardless of whether other parts of the nonword are produced correctly.
We first prepare the data for a boxplot:
df_struct <- df_add_data %>%
select(-PCC, -PVC, -NWR_pc_Total) %>%
rename(`Coda /l/` = Coda_l_pc, `Branching onset` = Branching_onset_pc, `Final /l/` = Final_l_pc) %>%
pivot_longer(cols = 2:4, names_to = "struct", values_to = "pc_correct_rep") %>%
mutate(struct = as.factor(struct))
Then we draw the boxplot:
df_struct %>%
ggplot(aes(x = struct, y = pc_correct_rep, fill = struct)) +
geom_boxplot(outlier.size = 3, show.legend = FALSE, colour = "black") +
scale_fill_grey() +
stat_summary(fun = mean, geom = "point", shape = 4, size = 4, stroke = 2, show.legend = FALSE) +
geom_jitter(shape = 16, size = 1, position = position_jitter(0.075), show.legend = FALSE) +
theme_classic() +
ylab("Percentage of correct repetition") + xlab("") +
scale_y_continuous(labels = scales::percent, breaks = seq(0.0, 1.0, 0.1))
The average values for the three distributions appear to differ: at least the average for branching onsets seems higher than the other two. We can assess whether these differences are significant.
We first compute the average values explicitly:
df_struct %>% group_by(struct) %>% summarize(mean = mean(pc_correct_rep))
# A tibble: 3 x 2
struct mean
<fct> <dbl>
1 Branching onset 0.847
2 Coda /l/ 0.754
3 Final /l/ 0.726
We assess whether the distributions of percentages of correct repetition are normally distributed:
df_struct %>%
ggplot(aes(sample = pc_correct_rep)) +
stat_qq() +
stat_qq_line() +
facet_grid(. ~ struct)
The distributions do not look normal, and we should therefore consider a Kruskal-Wallis test:
df_struct %>% kruskal.test(pc_correct_rep ~ struct, data = .)
Kruskal-Wallis rank sum test
data: pc_correct_rep by struct
Kruskal-Wallis chi-squared = 1.5272, df = 2, p-value = 0.466
There is no significant difference between the average values for the three distributions.
We can also observe that the variances of the three distributions look quite different. Is that indeed the case, i.e., are they significantly different from each other?
We first compute the variances explicitly:
df_struct %>% group_by(struct) %>% summarize(var = var(pc_correct_rep))
# A tibble: 3 x 2
struct var
<fct> <dbl>
1 Branching onset 0.0123
2 Coda /l/ 0.0953
3 Final /l/ 0.0820
Given the non-normality of the distributions, we consider the Fligner-Killeen non-parametric test of homogeneity of variances:
df_struct %>% fligner.test(pc_correct_rep ~ struct, data = .)
Fligner-Killeen test of homogeneity of variances
data: pc_correct_rep by struct
Fligner-Killeen:med chi-squared = 43.242, df = 2, p-value = 0.0000000004074
At least two variances are significantly different from each other. There are no post-hoc tests for the Fligner-Killeen test, but we can apply it to each pair of variances.
df_struct %>% filter(struct != "Branching onset") %>% fligner.test(pc_correct_rep ~ struct, data = .)
Fligner-Killeen test of homogeneity of variances
data: pc_correct_rep by struct
Fligner-Killeen:med chi-squared = 3.4613, df = 1, p-value = 0.06282
df_struct %>% filter(struct != "Final /l/") %>% fligner.test(pc_correct_rep ~ struct, data = .)
Fligner-Killeen test of homogeneity of variances
data: pc_correct_rep by struct
Fligner-Killeen:med chi-squared = 45.397, df = 1, p-value = 0.00000000001608
df_struct %>% filter(struct != "Coda /l/") %>% fligner.test(pc_correct_rep ~ struct, data = .)
Fligner-Killeen test of homogeneity of variances
data: pc_correct_rep by struct
Fligner-Killeen:med chi-squared = 27.055, df = 1, p-value = 0.0000001977
The problem here is that we need to adjust the p-values for multiple comparisons. We apply a Holm-Bonferroni correction:
p_bo <- df_struct %>% filter(struct != "Branching onset") %>% fligner.test(pc_correct_rep ~ struct, data = .) %>% .$p.value
p_final_l <- df_struct %>% filter(struct != "Final /l/") %>% fligner.test(pc_correct_rep ~ struct, data = .) %>% .$p.value
p_coda_l <- df_struct %>% filter(struct != "Coda /l/") %>% fligner.test(pc_correct_rep ~ struct, data = .) %>% .$p.value
p.adjust(c(p_bo, p_final_l, p_coda_l), method = "holm", n = 3)
[1] 0.06281975368431658 0.00000000004825425 0.00000039549925356
We find that the variance of the percentages of correct repetition for branching onsets is significantly lower than the variance for l in final position (p < 0.001), and likewise significantly lower than the variance for l in coda position (p < 0.001).
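For reference, the Holm-Bonferroni adjustment applied above can be reproduced by hand: sort the p-values, multiply the i-th smallest by (n - i + 1), take the running maximum, and cap at 1. A minimal sketch with illustrative p-values (not those of our tests):

```r
# Manual Holm-Bonferroni adjustment, equivalent to p.adjust(..., method = "holm")
p <- c(0.03, 0.04, 0.005)                        # illustrative p-values
n <- length(p)
o <- order(p)                                    # ascending order of p-values
adj_sorted <- pmin(1, cummax((n - seq_len(n) + 1) * p[o]))
adj <- adj_sorted[order(o)]                      # back to the original order
adj                                              # 0.060 0.060 0.015
all.equal(adj, p.adjust(p, method = "holm"))     # TRUE
```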
We first prepare the data for the figure:
df_pvc_pcc <- df_add_data %>%
select(Participant, PCC, PVC) %>%
pivot_longer(cols = PVC:PCC, names_to = "type", values_to = "score")
We then draw a boxplot to visually assess the situation:
df_pvc_pcc %>%
mutate(type = as.factor(type),
score = score / 100) %>%
ggplot(aes(x = type, y = score, fill = type)) +
geom_boxplot(outlier.size = 3, show.legend = FALSE, colour = "black") +
scale_fill_grey() +
stat_summary(fun = mean, geom = "point", shape = 4, size = 4, stroke = 2, show.legend = FALSE) +
geom_jitter(shape = 16, size = 1, position = position_jitter(0.075), show.legend = FALSE) +
theme_classic() +
ylab("Percentage (of Vowels or Consonants) Correct") + xlab("") +
scale_y_continuous(labels = scales::percent, breaks = seq(0.7, 1.0, 0.05)) +
scale_x_discrete(labels = c('Consonants', 'Vowels'))
We can observe that the variances of the two distributions seem quite different from each other, but that the mean and median values seem close. We can compute them:
df_pvc_pcc %>% group_by(type) %>% summarize(var = var(score), mean = mean(score), median = median(score))
# A tibble: 2 x 4
type var mean median
<chr> <dbl> <dbl> <dbl>
1 PCC 21.7 92.6 94
2 PVC 12.6 94.6 95
We can assess differences with inferential tests. To choose appropriate tests, we assess the normality of the distributions for PVC and PCC with a quantile-quantile plot:
df_pvc_pcc %>%
mutate(type = as.factor(type),
score = score / 100) %>%
ggplot(aes(sample = score)) +
stat_qq() +
stat_qq_line() +
facet_grid(. ~ type)
The distributions look rather far from normality. We therefore consider a Mann-Whitney U test and a Fligner-Killeen test of homogeneity of variances:
df_pvc_pcc %>% wilcox.test(score ~ type, data = .)
Wilcoxon rank sum test with continuity correction
data: score by type
W = 1462, p-value = 0.02104
alternative hypothesis: true location shift is not equal to 0
The two medians are significantly different from each other (p = 0.0210).
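For reference, here is a minimal, self-contained example of what wilcox.test() computes, on two made-up samples with no ties (so the exact p-value is used):

```r
# Exact Wilcoxon rank-sum test on toy data (hypothetical values)
a <- c(88, 90, 91, 93, 94)
b <- c(95, 96, 97, 98, 99)
# Every value of b exceeds every value of a, so the statistic W = 0
# and the two-sided exact p-value is 2 / choose(10, 5)
wilcox.test(a, b)$p.value  # 0.007936508
```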
fligner.test(score ~ type, data = df_pvc_pcc)
Fligner-Killeen test of homogeneity of variances
data: score by type
Fligner-Killeen:med chi-squared = 4.1552, df = 1, p-value = 0.04151
The variances also appear to be significantly different from each other (p = 0.0415).
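The Fligner-Killeen test detects a difference in spread even when the means are identical; a small sketch on simulated data (hypothetical values, not our scores):

```r
# Two simulated samples with the same mean but very different spreads
set.seed(123)
g1 <- rnorm(100, mean = 0, sd = 1)
g2 <- rnorm(100, mean = 0, sd = 3)
# The variances differ by a factor of 9; the test should detect this
fligner.test(list(g1, g2))
```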
We can display an overview of the data for monolinguals:
df_L1 %>%
dfSummary() %>%
print(method = 'render', footnote = NA)

| No | Variable | Valid | Missing |
|----|------------------------|---------------|----------|
| 1 | subject [character] | 2059 (100.0%) | 0 (0.0%) |
| 2 | group [character] | 2059 (100.0%) | 0 (0.0%) |
| 3 | nonword [character] | 2059 (100.0%) | 0 (0.0%) |
| 4 | repetition [character] | 2059 (100.0%) | 0 (0.0%) |
We create a table with all the data for L1 and L2 participants:
df_all <- df %>%
dplyr::select(subject, nonword, repetition) %>%
mutate(group = "Bi") %>%
mutate(repetition = as.character(repetition)) %>% # To allow binding the corresponding column with the one from df_L1
rbind(df_L1) %>%
mutate(repetition = as.integer(repetition), # For now, we set repetition as an integer
group = as.factor(group))
We can see that all the participants we consider have been evaluated with 71 nonwords:
df_all %>% group_by(group, subject) %>% summarize(nb_nonwords = n()) %>% print(n=100)
# A tibble: 91 x 3
# Groups: group [3]
group subject nb_nonwords
<fct> <chr> <int>
1 Bi EANA-A1 71
2 Bi EANA-A2 71
3 Bi EANA-A3 71
4 Bi EANA-A4 71
5 Bi EANA-A5 71
6 Bi EANA-A6 71
7 Bi EANA-A7 71
8 Bi EANA-A8 71
9 Bi EANA-A9 71
10 Bi EANA-AL1 71
11 Bi EANA-AL2 71
12 Bi EANA-AL3 71
13 Bi EANA-AL4 71
14 Bi EANA-AL5 71
15 Bi EANA-AL6 71
16 Bi EANA-AR1 71
17 Bi EANA-AR2 71
18 Bi EANA-AR3 71
19 Bi EANA-E1 71
20 Bi EANA-EA1 71
21 Bi EANA-GA1 71
22 Bi EANA-I1 71
23 Bi EANA-I2 71
24 Bi EANA-I3 71
25 Bi EANA-I4 71
26 Bi EANA-I5 71
27 Bi EANA-IA1 71
28 Bi EANA-IA10 71
29 Bi EANA-IA11 71
30 Bi EANA-IA12 71
31 Bi EANA-IA2 71
32 Bi EANA-IA3 71
33 Bi EANA-IA4 71
34 Bi EANA-IA5 71
35 Bi EANA-IA6 71
36 Bi EANA-IA7 71
37 Bi EANA-IA8 71
38 Bi EANA-IA9 71
39 Bi EANA-J1 71
40 Bi EANA-J2 71
41 Bi EANA-K1 71
42 Bi EANA-M1 71
43 Bi EANA-P1 71
44 Bi EANA-P2 71
45 Bi EANA-P3 71
46 Bi EANA-P4 71
47 Bi EANA-P5 71
48 Bi EANA-P6 71
49 Bi EANA-P7 71
50 Bi EANA-PB1 71
51 Bi EANA-PB2 71
52 Bi EANA-PB3 71
53 Bi EANA-PB4 71
54 Bi EANA-R1 71
55 Bi EANA-R2 71
56 Bi EANA-R3 71
57 Bi EANA-R4 71
58 Bi EANA-R5 71
59 Bi EANA-RAR1 71
60 Bi EANA-RU 71
61 Bi EANA-RUT 71
62 Bi EANA-UR1 71
63 MoDLD ALL 71
64 MoDLD AMO 71
65 MoDLD AUC 71
66 MoDLD CEM 71
67 MoDLD CHD 71
68 MoDLD CLK 71
69 MoDLD CMU 71
70 MoDLD ELV 71
71 MoDLD GAV 71
72 MoDLD LIB 71
73 MoDLD LOP 71
74 MoDLD MAL 71
75 MoDLD MAM 71
76 MoDLD MOB 71
77 MoDLD NBE 71
78 MoDLD PEN 71
79 MoDLD VAM 71
80 MoTD ANB 71
81 MoTD KYP 71
82 MoTD LEB 71
83 MoTD LIT 71
84 MoTD MAD 71
85 MoTD MAS 71
86 MoTD MLI 71
87 MoTD MLM 71
88 MoTD OCS 71
89 MoTD THP 71
90 MoTD VAH 71
91 MoTD WYR 71
We can find how many participants we have in each group:
df_all %>% dplyr::select(group, subject) %>% unique() %>% group_by(group) %>% summarize(nb_participants = n())
# A tibble: 3 x 2
group nb_participants
<fct> <int>
1 Bi 62
2 MoDLD 17
3 MoTD 12
We have many more bilingual participants, but this doesn’t prevent us from comparing the three groups.
We compute the average percentage of correct repetition per subject:
df_group <- df_all %>%
mutate(repetition = as.integer(repetition)) %>%
group_by(group, subject) %>%
summarize(av_pc_correct_repetition = mean(repetition)) %>%
ungroup()
We then draw a boxplot:
df_group %>%
ggplot(aes(x = group, y = av_pc_correct_repetition, fill = group)) +
geom_boxplot(outlier.size = 3,show.legend = FALSE, colour = "black") +
scale_fill_grey() +
stat_summary(fun = mean, geom = "point", shape = 4, size = 4, stroke = 2, show.legend = FALSE) +
geom_jitter(shape = 16, size = 1, position = position_jitter(0.075), show.legend = FALSE) +
theme_classic() +
ylab("Percentage of correct repetition") + xlab("") +
scale_y_continuous(labels = scales::percent, breaks = seq(0.0, 1.0, 0.05))
We have three groups of participants we want to compare: monolinguals with typical development (Mo-TD), monolinguals with developmental language disorder (Mo-DLD), and bilinguals (Bi).
We’re going to use a mixed-effects logistic regression to account for the non-independence of observations, i.e., their grouping by subject and nonword. repetition is the dependent variable and group the only independent variable. We keep things simple by considering random intercepts for subject and nonword, and by not considering other predictors.
model.comp <- glmer(repetition ~ group + (1 | subject) + (1 | nonword),
data = df_all, control = ctrl, family = binomial(link = "logit"))
The model converges without any problem.
In order to trust the outputs of our model, we need to make sure that the assumptions underlying a mixed-effects logistic regression are met. While these assumptions are not as clear as for linear regression, the following conditions matter: - The normality of the different distributions for the random effects - The correct distribution of the residuals of the model
p <- plot_model(model.comp, type = "diag", show.values = TRUE, value.offset = .3)
p$nonword + labs(title = "Random intercept for nonword") + theme_sjplot()
p$subject + labs(title = "Random intercept for subject") + theme_sjplot()
The distributions for the random effects are close to normal: no problem here.
sim.residuals <- simulateResiduals(fittedModel = model.comp, n = 1000, refit = FALSE)
sim.residuals %>% plotQQunif()
plotResiduals(sim.residuals, form = df_all$group, xlab = "Group of subjects", ylab = "Simulated Standardized Residuals")
The simulated residuals look very good. Additionally, the Levene test for the homogeneity of the variances of the three groups is non-significant, meaning the distributions are homoscedastic. We can therefore trust our model.
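The Levene test itself is not shown above. It amounts to a one-way ANOVA on the absolute deviations from the group means (this is what car::leveneTest() computes with center = mean). A self-contained sketch of that logic on simulated stand-in residuals (hypothetical data, not our model's):

```r
# Levene's test by hand: ANOVA on |x - group mean|
# (equivalent to car::leveneTest(..., center = mean))
set.seed(1)
resids <- rnorm(90)                                    # stand-in residuals
grp <- factor(rep(c("Bi", "MoDLD", "MoTD"), each = 30))
abs_dev <- abs(resids - ave(resids, grp))              # absolute deviations from group means
anova(lm(abs_dev ~ grp))$`Pr(>F)`[1]                   # p-value of the test
```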
We can now analyze the impact of the group predictor:
Rather than the average percentage of correct repetition per group, we can display the marginal means estimated with the regression model, i.e., accounting for the random effects:
plot_model(model.comp, type = "emm", terms = "group", show.values = TRUE, value.offset = .3) +
labs(title = "Marginal estimated means for the different groups of subjects", x = "Group", y = "Predicted probability of correct repetition")
The estimated marginal means for the three groups are, more precisely:
emmeans(model.comp, "group", type = "response")
 group  prob     SE  df asymp.LCL asymp.UCL
Bi 0.831 0.0226 Inf 0.782 0.871
MoDLD 0.445 0.0651 Inf 0.323 0.573
MoTD 0.941 0.0181 Inf 0.894 0.968
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
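This back-transformation from the logit to the probability scale is simply the inverse-logit function, available in base R as plogis(). For instance, the Bi group's probability of 0.831 corresponds to roughly 1.59 on the log-odds scale:

```r
# Inverse-logit: exp(x) / (1 + exp(x)), i.e. plogis()
logit_est <- 1.59               # approximate log-odds estimate for the Bi group
round(plogis(logit_est), 3)     # back to the probability scale: 0.831
# The reverse direction, probability -> log-odds, is qlogis()
round(qlogis(0.831), 2)         # 1.59
```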
We can assess whether differences are significant:
summary(emmeans(model.comp, pairwise ~ group, adjust = "mvt"), infer = c(TRUE, TRUE), null = 0)$contrasts
 contrast      estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
Bi - MoDLD 1.81 0.279 Inf 1.16 2.47 6.499 <.0001
Bi - MoTD -1.18 0.338 Inf -1.97 -0.39 -3.482 0.0014
MoDLD - MoTD -2.99 0.399 Inf -3.92 -2.06 -7.495 <.0001
Results are given on the log odds ratio (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
We find significant differences for all pairs of groups, but bilingual subjects seem closer to monolinguals with typical development than to monolinguals with DLD. We should assess this further.
We can assess with a resampling method, more precisely a bootstrap test, whether the bilinguals’ average performance is significantly closer to the Mo-TD’s than to the Mo-DLD’s. Our alternative hypothesis is that bilingual subjects are closer on average (in terms of performance during the NWR task) to monolinguals with TD than to monolinguals with DLD. The null hypothesis we try to reject is that bilingual subjects are equidistant from Mo-TD and Mo-DLD, or closer to Mo-DLD.
We define a function which, given the distribution of by-subject repetition scores in our three groups, derives a resampled distribution and checks whether H0 is verified with this new distribution:
get_result_test <- function(i, df_subj) {
results <- df_subj %>% group_by(group) %>% slice_sample(prop = 1.0, replace = TRUE) %>% summarize(av_perf = mean(av_perf)) %>% ungroup()
v <- results %>% pull(av_perf)
names(v) <- results %>% pull(group)
test <- (v["Bi"] - v["MoDLD"]) <= (v["MoTD"] - v["Bi"])
return (test)
}
We call the previous function 10,000 times to see how many times resampling leads to H0 being verified:
df_subj <- df_all %>% group_by(group, subject) %>% summarize(av_perf = mean(repetition)) %>% ungroup()
tests <- sapply(1:10000, get_result_test, df_subj = df_subj, simplify = T) %>% as.numeric()
sum(tests)
[1] 40
The number of times H0 is verified is very low and amounts to the following percentage of the 10,000 attempts:
sum(tests) / length(tests)
[1] 0.004
This percentage corresponds to the rate of false positives were we to reject H0, and it is smaller than 0.05 (p = 0.004). We can therefore conclude that the bilinguals’ average performance is significantly closer to the Mo-TD’s than to the Mo-DLD’s.
We simply draw a boxplot to visually assess the situation:
df %>%
group_by(subject) %>%
summarize(rec_voc_zscore = mean(rec_voc_zscore)) %>%
ungroup() %>%
mutate(group = as.factor("participants")) %>%
ggplot(aes(x = group, y = rec_voc_zscore)) +
geom_boxplot(outlier.size = 3, fill="#A4A4A4", coef = 1.5) +
stat_summary(fun = mean, geom = "point", shape = 4, size = 4, stroke = 2) +
geom_jitter(shape = 16, size = 1, position = position_jitter(0.075)) +
theme_classic() +
ylab("Receptive vocabulary size (z score)") + xlab("") +
scale_y_continuous(breaks = seq(-30.0, 0.0, 5.0))